Method and system for providing indeterminate read data latency in a memory system

ABSTRACT

A method and system for providing indeterminate read data latency in a memory system. The method includes determining if a local data packet has been received. If a local data packet has been received, then the local data packet is stored into a buffer device. The method also includes determining if the buffer device contains a data packet and determining if an upstream driver for transmitting data packets to a memory controller via an upstream channel is idle. If the buffer contains a data packet and the upstream driver is idle, then the data packet is transmitted to the upstream driver. The method further includes determining if an upstream data packet has been received. The upstream data packet is in a frame format that includes a frame start indicator and an identification tag for use by the memory controller in associating the upstream data packet with its corresponding read instruction. If an upstream data packet has been received and the upstream driver is not idle, then the upstream data packet is stored into the buffer device. If an upstream data packet has been received and the buffer device does not contain a data packet and the upstream driver is idle, then the upstream data packet is transmitted to the upstream driver. If the upstream driver is not idle, then any data packets in progress continue to be transmitted to the upstream driver.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 11/289,193 filed Nov. 28, 2005, the contents of which are incorporated by reference herein in their entirety.

BACKGROUND OF THE INVENTION

This invention relates to memory systems comprised of hub devices connected to a memory controller by a daisy chained channel. The hub devices are attached to, or reside upon, memory modules that contain memory devices. More particularly, this invention relates to the flow control of read data and the identification of read data returned to the controller by each hub device.

Many high performance computing main memory systems use multiple fully buffered memory modules connected to a memory controller by one or more channels. The memory modules contain a hub device and multiple memory devices. The hub device fully buffers command, address and data signals between the memory controller and the memory devices. The flow of read data is controlled using either a leveled latency or position dependent latency technique. In both cases, the memory controller is able to predict the return time of read data requested from the memory modules and to schedule commands to avoid collisions as read data is merged onto the controller interface by each memory module.

In some cases, the memory controller is able to issue a read data delay adder along with the read command. This instructs the targeted hub device to add additional delay to the return of read data in order to simplify the issuing of commands and to avoid collisions. In all cases, the read data must be returned in the order in which it was requested. Further, the total read data latency must be completely predictable by the memory controller. During run time operations, these two restrictions result in additional gaps being added to packets of read data that are returned from the memory modules. This adds latency to the average read operation. In addition, hubs are not able to use indeterminate techniques to return read data faster or slower than normal. These techniques include, but are not limited to, caching read data locally, reading memory devices speculatively, independently managing memory device address pages, data compression, etc.

To optimize average read data latency under real workload conditions, and to enable advanced hub device capabilities, what is needed is a way to allow memory modules to return read data to the memory controller at an unpredicted time. This must be done in a way that does not corrupt read data and that allows the memory controller to identify each read data packet. Preventing data corruption by avoiding data collisions is especially complicated as hub devices merge local read data onto a cascaded memory controller channel.

BRIEF SUMMARY OF THE INVENTION

Exemplary embodiments include a method for providing indeterminate read data latency. The method includes determining if a local data packet has been received. If a local data packet has been received, then the local data packet is stored into a buffer device. The method also includes determining if the buffer device contains a data packet and determining if an upstream driver for transmitting data packets to a memory controller via an upstream channel is idle. If the buffer contains a data packet and the upstream driver is idle, then the data packet is transmitted to the upstream driver. The method further includes determining if an upstream data packet has been received. The upstream data packet is in a frame format that includes a frame start indicator and an identification tag for use by the memory controller in associating the upstream data packet with its corresponding read instruction. If an upstream data packet has been received and the upstream driver is not idle, then the upstream data packet is stored into the buffer device. If an upstream data packet has been received and the buffer device does not contain a data packet and the upstream driver is idle, then the upstream data packet is transmitted to the upstream driver. If the upstream driver is not idle, then any data packets in progress continue to be transmitted to the upstream driver.

Exemplary embodiments include a hub device in a memory system. The hub device includes a device for receiving data packets, an upstream driver for transmitting data packets to a memory controller via an upstream channel and a mechanism including instructions for facilitating indeterminate read data latency. The device for receiving data packets includes an upstream receiver for receiving upstream data packets from a downstream hub device and a memory interface for receiving local data packets from a local storage device. Each data packet is in a frame format that includes a frame start indicator and an identification tag for use by a memory controller in associating the data packet with its corresponding read instruction. The instructions on the mechanism facilitate determining if a local data packet has been received. If a local data packet has been received, then the local data packet is stored into a buffer device. The instructions also facilitate determining if the buffer device contains a data packet and determining if the upstream driver is idle. If the buffer contains a data packet and the upstream driver is idle, then the data packet is transmitted to the upstream driver. The instructions further facilitate determining if an upstream data packet has been received. If an upstream data packet has been received and the upstream driver is not idle, then the upstream data packet is stored into the buffer device. If an upstream data packet has been received and the buffer device does not contain a data packet and the upstream driver is idle, then the upstream data packet is transmitted to the upstream driver. If the upstream driver is not idle, then any data packets in progress continue to be transmitted to the upstream driver.

Exemplary embodiments include a memory subsystem with one or more memory modules. The memory modules include one or more memory devices connected to a memory controller by a daisy chained channel. The read data is returned to the memory controller using a frame format that includes an identification tag and frame start indicator. The memory system also includes one or more hub devices on the memory modules for buffering address, commands and data. The hub devices include controller channel buffers that are used in conjunction with a preemptive local data merge algorithm to minimize read data latency and enable indeterminate read data return times to the memory controller.

Further exemplary embodiments include a memory system with one or more memory modules. The memory modules include memory devices that are connected to a memory controller by a daisy chained channel. The read data is returned to the memory controller using a frame format that includes an identification tag and frame start indicator. The memory system also includes one or more hub devices connected to the memory modules for buffering address, commands and data. The hub devices include controller channel buffers that are used in conjunction with a preemptive local data merge algorithm to minimize read data latency and enable indeterminate read data return times to the memory controller.

BRIEF DESCRIPTION OF THE DRAWINGS

Referring now to the drawings wherein like elements are numbered alike in the several FIGURES:

FIG. 1 depicts an exemplary memory system with multiple levels of daisy chained memory modules with point-to-point connections;

FIG. 2 depicts an exemplary memory system with hub devices that are connected to memory modules and to a memory controller by a daisy chained channel;

FIG. 3 depicts a hub logic device that may be utilized by exemplary embodiments;

FIG. 4 is an exemplary process flow implemented by the hub logic device in exemplary embodiments; and

FIG. 5 is a read data format that may be utilized by exemplary embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Exemplary embodiments utilize controller channel buffers (CCBs), read data frame formats with identification tags and a preemptive data merge technique to enable minimized and indeterminate read data latency. Exemplary embodiments allow memory modules to return read data to a memory controller at an unpredicted time. Identification tag information is added to the read data packet to indicate the read command that the data is a result of, as well as the hub where the data was read. The identification tag information is utilized by the controller to match the read data packet to the read commands issued by the controller. By using the identification tag information, read data can be returned in an order that is different from the issue order of the corresponding read commands.
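By way of non-limiting illustration, the tag matching performed by the controller may be sketched in Python as follows. The class name ReadTracker, its method names and the use of small integers as identification tags are assumptions made for this sketch only and are not taken from the embodiments described herein. The sketch is intended to show only that a lookup keyed by the identification tag allows read data to return in an order different from the issue order.

```python
class ReadTracker:
    """Hypothetical controller-side table of outstanding read commands."""

    def __init__(self):
        # Maps identification tag -> address of the outstanding read command.
        self.pending = {}

    def issue(self, tag, address):
        # A tag may not be reused while its read is still outstanding.
        if tag in self.pending:
            raise ValueError(f"tag {tag} is already outstanding")
        self.pending[tag] = address

    def complete(self, tag, data):
        # Match the returned packet to its command using only the tag;
        # arrival time and arrival order play no part in the lookup.
        address = self.pending.pop(tag)
        return address, data


tracker = ReadTracker()
tracker.issue(tag=1, address=0x100)
tracker.issue(tag=2, address=0x200)

# The data for the second read arrives first and is still matched correctly.
second = tracker.complete(tag=2, data=b"second")
first = tracker.complete(tag=1, data=b"first")
```

In this sketch a tag becomes available for reuse as soon as its read data packet has been matched; how tags are actually allocated and recycled is not specified here.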

Exemplary embodiments also provide a preemptive data merge process to prevent data collisions on the upstream channel when implementing the indeterminate read data latency. A CCB is added to the hub device to temporarily store read data. When a memory device on the memory module reads data, the data is transferred from the memory interface to the buffer. When the hub device detects that an upstream data packet (i.e., a data packet being sent to the controller from a hub device that is downstream from the detecting hub device) is not in the middle of being transferred into the detecting hub device via an upstream channel (it typically takes several transfers to send the entire data packet), the detecting hub device checks to see if there is a read data packet in its CCB that is waiting to be sent upstream. If the hub device detects a read data packet in the CCB it drives the read data packet from the CCB onto the upstream data bus. In the meantime, if a new upstream data packet is received via the upstream data bus, the data packet is stored in the CCB on the hub device. In this manner, data packets coming upstream do not collide with data packets being sent upstream from the CCB on the hub device. In the case where there is more than one data packet in the CCB, a variety of methods may be implemented to determine which data packet to send next (e.g., the data packet from the oldest read command may be sent first).

Exemplary embodiments apply to memory systems constructed of one or more memory modules 110 that are connected to a memory controller 102 by a daisy chained memory channel 114 as depicted in FIG. 1. The memory modules 110 contain both a hub device 112 that buffers command, address and data signals to and from the memory channel 114 as well as one or more memory devices 108 connected to the hub device 112. The downstream portion of the memory channel 114, the downstream channel 104, transmits write data and memory operation commands to the hub devices 112. The upstream portion of the memory channel 114, the upstream channel 106, returns requested read data (referred to herein as upstream data packets).

FIG. 2 depicts an alternate exemplary embodiment that includes a memory system constructed of one or more memory modules 110 connected to hub devices 112 that are further connected to a memory controller 102 by a daisy chained memory channel 114. In this embodiment, the hub device 112 is not located on the memory module 110; instead the hub device 112 is in communication with the memory module 110. As depicted in FIG. 2, the memory modules 110 may be in communication with the hub devices 112 via multi-drop connections and/or point-to-point connections. Other hardware configurations are possible; for example, exemplary embodiments may utilize only a single level of daisy chained hub devices 112 and/or memory modules 110.

FIG. 3 depicts a hub device 112 with flow control logic 308 utilized by exemplary embodiments to perform the processing described herein. The hub device 112 and the components within the hub device 112 may be implemented in hardware and/or software. The hub device 112 receives upstream data packets on the upstream channel 106 via the receiver logic 304 (also referred to herein as an upstream receiver). The upstream data packets are data packets being sent to the controller 102 from a hub device 112 that is downstream from the receiving hub device 112. An upstream data packet can be sent to the driver logic 306 (also referred to herein as the upstream driver) to be driven towards the controller 102 on the upstream channel 106 or, if the upstream channel 106 is busy, the upstream data packet can be temporarily stored in the CCB 310 on the hub device 112. The destination of the upstream data packet is determined by the flow control logic 308 and implemented by sending a signal to the local data multiplexer 312.

In exemplary embodiments, CCBs 310, or buffer devices, reside in the hub device 112 and safely capture upstream data packet transfers (via the receiver logic 304) that are shunted into the CCB 310 while the hub device 112 is merging its local data packets onto the upstream channel 106. Local data packets are data packets that are read from memory devices 108 attached to the memory module 110 being directed by the hub device 112. These memory devices 108 are also referred to herein as local storage devices. The data read from the local storage devices, the local data packets, are formatted for return on an upstream controller interface via the upstream driver and stored in the CCB 310. The formatting includes serializing the local data packet into the proper frame format (e.g., see exemplary frame format depicted in FIG. 5), and inserting values into the identification tag (sourced from the read request), first transfer field, and bus cyclical redundancy code (CRC) field. In exemplary embodiments, the formatting of the local data packet is performed as part of storing the local data packet into the CCB 310.

When a data packet is received at the memory interface 302, it is stored into the CCB 310 while the local data packets are waiting to be merged onto the upstream channel 106 (via the driver logic 306). The identification tag within the data packet allows the memory controller 102 to correlate a returned read data packet with its corresponding read data request command. The data packet also contains a small, easy to decode ‘start’, or first transfer (‘ft’) field (also referred to herein as a frame start indicator) delivered near the beginning of an upstream read data frame (data packets are formatted as read data frames) which indicates that a read data frame is present in the data packet. This is used by the flow control logic 308 in the hub device 112 to monitor the channel read data activity.

When there is data in the CCBs 310 from either a local read operation or from a previously shunted read data packet from a downstream hub device (the data packets in the CCB are referred to herein as stored data packets), the hub device 112 will merge it onto the upstream channel 106 via the driver logic 306 as soon as it is allowed. The hub device 112 merges local data onto the upstream channel 106 whenever the upstream channel 106 is idle, or immediately following the last transfer of a data packet that is currently in progress. Read data frames will never be bisected using this method, but read data frames that are in flight on the upstream channel 106 that have not yet arrived at a hub device's 112 local data multiplexer 312 may be preempted and shunted into the CCB 310. This allows gaps in read data on the upstream channel 106 to be minimized which increases bus efficiency and results in reduced average read data latency under real world work load conditions.

When there are multiple read data packets present in the CCBs 310, the hub device 112 can be configured to send the read data packet corresponding to the earliest read command. This minimizes undue latency on read requests issued to hub devices 112 that are many daisy chain positions away from the memory controller 102. Other CCB 310 unload prioritization algorithms may also be implemented. For example, the identification tag field of the read data frame may contain a priority field. The priority field can be used to guide the unloading of the CCBs 310. Alternatively, priority information may be delivered as the read data is requested. Hub devices 112 can then compare the identification tag to previously recorded priority information to determine the location in the CCB 310 to send next. A method may also be employed that occasionally sends lower priority data before high priority data to ensure that low priority data is not completely stalled by requests that have been tagged with a higher priority.
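By way of non-limiting illustration, two of the unload policies described above may be sketched in Python as follows. The record StoredPacket, its fields issue_seq and priority, and the parameter starvation_interval are hypothetical names introduced for this sketch; in particular, it is assumed that the hub device can associate each stored data packet with the issue order of its read command, and that a larger priority value means more urgent.

```python
from dataclasses import dataclass


@dataclass
class StoredPacket:
    tag: int           # identification tag carried in the read data frame
    issue_seq: int     # assumed: issue order of the corresponding read command
    priority: int = 0  # assumed: larger value means more urgent


def pick_oldest(ccb):
    """Select the packet whose read command was issued earliest."""
    return min(ccb, key=lambda p: p.issue_seq)


def pick_by_priority(ccb, send_count, starvation_interval=4):
    """Select the highest-priority packet, breaking ties by age.

    On every starvation_interval-th send the oldest packet is chosen
    regardless of priority, so low priority data is not stalled forever.
    """
    if send_count % starvation_interval == starvation_interval - 1:
        return pick_oldest(ccb)
    return max(ccb, key=lambda p: (p.priority, -p.issue_seq))


ccb = [
    StoredPacket(tag=1, issue_seq=5, priority=0),
    StoredPacket(tag=2, issue_seq=7, priority=3),
    StoredPacket(tag=3, issue_seq=6, priority=3),
]
oldest = pick_oldest(ccb)
urgent = pick_by_priority(ccb, send_count=0)
relief = pick_by_priority(ccb, send_count=3)
```

The fixed interval is only one possible way to let lower priority data through occasionally; the embodiments do not prescribe a particular scheme.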

FIG. 4 is a process flow that is facilitated by the flow control logic 308 located in the hub device 112 in exemplary embodiments. The process depicted in FIG. 4 performs preemptive local data merge and may be implemented by a mechanism including hardware and/or software instructions such as a finite state machine in the flow control logic 308. The process starts at block 402 and is repeated, in exemplary embodiments, on a periodic basis (e.g., after each controller channel transfer, or upstream channel cycle). At block 404 any local read data packets (i.e., from memory devices 108 on memory modules 110 attached to the hub device 112) in the memory interface 302 are loaded into the CCB 310. This ensures that the flow control logic 308 is aware of and managing the upstream driving of local read data. At block 406, it is determined if there is data in the CCB 310. If there is no data in the CCB 310, then the data is routed from the receiver logic 304 to the driver logic 306 at block 412. The routing is directed by the flow control logic 308 by setting the local data multiplexer 312 to send the upstream data packet to the driver logic 306 for driving the upstream data packet onto the upstream channel 106 towards the controller 102. Processing then continues at 414, where processing is sent back to block 404 at the next upstream channel cycle.

If it is determined at block 406, that there is data in the CCB 310 then block 408 is performed to determine if an upstream channel operation is in process (i.e., is an upstream data packet or a local read data packet in the middle of being driven onto the upstream channel 106 via the driver logic 306). Processing continues at block 412 if an upstream channel operation is in process (i.e., the driver is busy). At block 412, upstream read data packets are routed from the receiver logic 304 to the driver logic 306 by setting the local data multiplexer 312 to send the upstream data packet to the driver logic 306. Alternatively, processing continues at block 410 if an upstream channel operation is not in process (i.e., the driver is idle) and there is data in the CCB 310. At block 410, data from the CCB 310 is driven onto the upstream channel 106 while any data packets received in the receiver logic 304 from the upstream channel 106 are shunted (stored) into the next available CCB 310 location. The shunting is performed by the flow control logic 308 directing the upstream data packets to be loaded into the CCB 310. Processing then continues at 414 which sends processing back to block 404 at the next upstream channel cycle.
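By way of non-limiting illustration, the preemptive local data merge may be sketched in Python as a simplified, packet-level model. The sketch follows the steps recited in the summary above (a local data packet is always buffered, a buffered data packet is sent when the upstream driver is idle, and an upstream data packet that arrives while the driver is busy is stored) rather than modeling the local data multiplexer 312 transfer by transfer. The class HubFlowControl, its attribute names, the first-in first-out unload order and the treatment of a packet as a whole unit are all assumptions of this sketch.

```python
from collections import deque

TRANSFERS_PER_PACKET = 16  # frame depth of the FIG. 5 example format


class HubFlowControl:
    """Hypothetical packet-level model of the merge decision in one hub."""

    def __init__(self, transfers_per_packet=TRANSFERS_PER_PACKET):
        self.ccb = deque()        # controller channel buffer, FIFO in this sketch
        self.transfers_per_packet = transfers_per_packet
        self.in_flight = None     # packet currently on the upstream driver
        self.remaining = 0        # transfers left for the in-flight packet
        self.sent = []            # packets fully driven upstream, in order

    def driver_idle(self):
        return self.remaining == 0

    def _start(self, packet):
        self.in_flight = packet
        self.remaining = self.transfers_per_packet

    def cycle(self, local_packet=None, upstream_packet=None):
        """Run one upstream channel cycle; an argument marks a frame start."""
        # Local read data is always loaded into the buffer first.
        if local_packet is not None:
            self.ccb.append(local_packet)

        if self.driver_idle():
            if self.ccb:
                # Buffer has data: shunt any arriving packet, then drive
                # the oldest stored packet onto the upstream channel.
                if upstream_packet is not None:
                    self.ccb.append(upstream_packet)
                self._start(self.ccb.popleft())
            elif upstream_packet is not None:
                # Buffer empty: pass the arriving packet straight through.
                self._start(upstream_packet)
        elif upstream_packet is not None:
            # Driver busy: store the arriving packet so nothing collides.
            self.ccb.append(upstream_packet)

        # Advance the in-flight packet by one transfer; it is never bisected.
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                self.sent.append(self.in_flight)
                self.in_flight = None


hub = HubFlowControl(transfers_per_packet=2)
hub.cycle(local_packet="L0", upstream_packet="U0")  # L0 preempts, U0 is shunted
hub.cycle()
hub.cycle()
hub.cycle()
```

In this example the local data packet is driven first and the shunted upstream data packet follows immediately after its last transfer, so both packets are delivered whole and without an idle gap between them.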

FIG. 5 is an exemplary read data frame format for upstream data packets and local read data packets on the upstream channel 106. The frame format depicted in FIG. 5 uses twenty-one signal lanes and each packet includes sixteen transfers. It includes a one bit frame start indicator 502 and an identification tag 504, as well as 256 bits (32 B) of read data 506 with a bus CRC 508 for transmission error detection. Other combinations of signal lanes and transfer depths can be used to create frame formats that include a frame start indicator, read data identification tag and read data that are compatible with this invention.
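By way of non-limiting illustration, the bit budget of the exemplary frame format may be checked with the short Python calculation below. The calculation assumes that each signal lane carries exactly one bit per transfer and that no bit positions are left unused; the constant names are introduced for this sketch only. The description does not state how the remaining bits are divided between the identification tag 504 and the bus CRC 508, so the sketch reports only their combined size.

```python
LANES = 21              # signal lanes in the FIG. 5 example
TRANSFERS = 16          # transfers per packet in the FIG. 5 example
FRAME_START_BITS = 1    # one bit frame start indicator
READ_DATA_BITS = 256    # 32 bytes of read data

# Total bit positions in one frame, assuming one bit per lane per transfer.
frame_bits = LANES * TRANSFERS

# Bits left over for the identification tag and bus CRC combined.
tag_and_crc_bits = frame_bits - FRAME_START_BITS - READ_DATA_BITS

print(frame_bits, tag_and_crc_bits)
```

Under these assumptions the frame holds 336 bit positions, of which 79 remain for the identification tag and the bus CRC together.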

Exemplary embodiments pertain to a computer memory system constructed of daisy chained hub logic devices connected to, or contained upon, memory modules. The hubs are daisy chained on a memory controller channel and are further attached to memory devices on the memory modules. The memory controller issues requests for read data to the hubs which merge this read data from the memory modules onto the memory channel. Using channel buffers and packet identification tags, the hubs are able to return read data at a time unpredicted by the memory controller, and at a time that may preempt a read request that had been issued earlier, without losing or corrupting any of the read data returned on the channel to the memory controller.

Exemplary embodiments may be utilized to optimize average read data latency by more fully utilizing the upstream channel. Through the use of CCBs, read data frame formats with identification tags and a preemptive data merge technique, indeterminate read data latency may be performed to more fully utilize the controller channel.

As described above, the embodiments of the invention may be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments of the invention may also be embodied in the form of computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. The present invention can also be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits.

While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance, but rather the terms first, second, etc. are used to distinguish one element from another.

1. A memory controller for use in a memory system, the memory controller comprising: an upstream channel for receiving one or more read data packets at an unpredicted time from a downstream hub device, each packet in a frame format including a frame start indicator and an identification tag; and computer instructions for correlating the received data packets with their corresponding read data request commands using the identification tags included in the read data packets.
 2. The memory controller of claim 1 wherein each read data packet is received at an unpredicted time relative to its corresponding read request command.
 3. The memory controller of claim 1 wherein the read data packets are received in an unpredicted order relative to an issue order of their corresponding read request commands.
 4. The memory controller of claim 1 wherein the upstream channel is a daisy chain channel.
 5. The memory controller of claim 1 wherein the hub device is physically located on a memory module.
 6. The memory controller of claim 1 wherein the hub device includes a buffer device.
 7. A method for providing indeterminate read data latency at a memory controller, the method comprising: sending a memory data request from the memory controller to a memory module via a downstream bus, the memory module including a hub device; receiving and processing the memory data request on the memory module; sending a response data packet from the memory module to the memory controller via an upstream bus, the response data packet including a frame start indicator and an identification tag; determining if the response data packet has been received at the memory controller using at least the frame start indicator; and correlating the received response data packet with the memory data request based on the identification tag included in the response data packet.
 8. The method of claim 7 wherein the identification tag in the memory response data packet is sourced from the read request.
 9. The method of claim 7 wherein the receiving a response data packet at the memory controller occurs at an unpredicted time.
 10. The method of claim 7 wherein the receiving of a response data packet at the memory controller occurs in an unpredicted order relative to an issue order of the corresponding memory data request command and other memory data request commands issued by the memory controller.
 11. The method of claim 7 wherein the receiving and processing of the memory data request on the memory module includes: receiving the memory data request at the memory module; and servicing the memory data request at the memory module, resulting in the response data packet.
 12. The method of claim 7 wherein the sending a response data packet includes executing a local preemptive data merge algorithm at the hub device to minimize read data latency and to enable indeterminate read data return times to the memory controller.
 13. The method of claim 12 wherein the local preemptive data merge algorithm includes: determining if a local data packet has been received; if a local data packet has been received, then storing the local data packet into a buffer device; determining if the buffer device contains a data packet; determining if an upstream driver is idle; if the buffer device contains a data packet and the upstream driver is idle, then transmitting the data packet to the upstream driver; determining if an upstream data packet has been received; if an upstream data packet has been received and the upstream driver is not idle, then storing the upstream data packet into the buffer device; if an upstream data packet has been received and the buffer device does not contain a data packet and the upstream driver is idle, then transmitting the upstream data packet to the upstream driver; and continuing to transmit any data packets in progress if the upstream driver is not idle.
 14. The method of claim 13 wherein the determining if a local data packet has been received, the determining if the buffer device contains a data packet, the determining if an upstream driver is idle and the determining if a data packet has been received are performed on a periodic basis.
 15. The method of claim 14 wherein the periodic basis is once every upstream channel cycle.
 16. The method of claim 13 wherein the buffer device contains a plurality of data packets and the data packet is selected based on a prioritization algorithm.
 17. The method of claim 16 wherein the prioritization algorithm selects the data packet based on the age of the read instruction that corresponds to the data packet.
 18. The method of claim 17 wherein the prioritization algorithm selects the data packet based on a priority associated with the data packet.
 19. The method of claim 1 wherein the frame format further includes a bus cyclical redundancy code (CRC) field.
 20. A memory system comprising: a memory controller including: an upstream channel for receiving one or more read data packets at an unpredicted time; and computer instructions for correlating the received read data packets with their corresponding read data request commands using identification tags included in the read data packets; one or more memory modules with one or more memory devices connected to the memory controller by a daisy chained channel, wherein the read data is returned to the memory controller as the read data packets using a frame format that includes the identification tag and a frame start indicator; and one or more hub devices on the memory modules for buffering address, commands and data, the hub devices including controller channel buffers used in conjunction with a preemptive local data merge algorithm to minimize read data latency and enable indeterminate read data return times to the memory controller.